Scene Dynamics
Reviews: Generating Videos with Scene Dynamics
Overall, this paper is very clearly laid out and easy to follow. Since the authors base much of their method on existing image-generation methods, the novelty lies in how they adapt those methods to generate video. It is worth emphasizing that I am not aware of any other papers that attempt this (and the authors also do not appear to have found any). The difficulty with video, unlike images, is that low frequencies span not only space but also time. As a result, when generating video, typical methods attempt to generate the temporal low frequencies first, which yields very jarring outputs.
ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation
Lu, Guanxing, Zhang, Shiyi, Wang, Ziwei, Liu, Changliu, Lu, Jiwen, Tang, Yansong
Performing language-conditioned robotic manipulation tasks in unstructured environments is in high demand for general intelligent robots. Conventional robotic manipulation methods usually learn semantic representations of the observation for action prediction, which ignores the scene-level spatiotemporal dynamics needed to accomplish human goals. In this paper, we propose a dynamic Gaussian Splatting method named ManiGaussian for multi-task robotic manipulation, which mines scene dynamics via future scene reconstruction. Specifically, we first formulate a dynamic Gaussian Splatting framework that infers semantic propagation in the Gaussian embedding space, where the semantic representation is leveraged to predict the optimal robot action. Then, we build a Gaussian world model to parameterize the distribution in our dynamic Gaussian Splatting framework, which provides informative supervision in the interactive environment via future scene reconstruction. We evaluate ManiGaussian on 10 RLBench tasks with 166 variations, and the results demonstrate that our framework outperforms state-of-the-art methods by 13.1% in average success rate.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Research Report > Promising Solution (0.48)
- Research Report > New Finding (0.48)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.54)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.39)
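The future-scene-reconstruction supervision described in the ManiGaussian abstract can be illustrated with a toy sketch. Everything here is a hypothetical simplification I introduce for illustration, not the paper's implementation: Gaussians are reduced to 2-D point means, and the learned Gaussian world model is stood in for by a fixed linear propagation under the action.

```python
# Toy sketch of future scene reconstruction as a supervision signal
# (hypothetical simplification of the idea, not the paper's method).

def propagate_gaussians(means, action, step=1.0):
    """Predict next-frame Gaussian means given a 2-D action velocity."""
    ax, ay = action
    return [(x + step * ax, y + step * ay) for x, y in means]

def reconstruction_loss(predicted, observed):
    """Mean squared error between predicted and observed future means."""
    n = len(predicted)
    return sum((px - ox) ** 2 + (py - oy) ** 2
               for (px, py), (ox, oy) in zip(predicted, observed)) / n

means = [(0.0, 0.0), (1.0, 1.0)]           # current scene Gaussians
action = (0.5, -0.5)                       # robot action (a velocity here)
pred = propagate_gaussians(means, action)  # world model's future guess
obs = [(0.5, -0.5), (1.5, 0.5)]            # observed future scene
loss = reconstruction_loss(pred, obs)      # 0.0: prediction matches future
```

In the actual framework this loss would flow back into the learned world model and the Gaussian embeddings; the toy version only shows where the supervision comes from.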
Deep Learning Method for Cell-Wise Object Tracking, Velocity Estimation and Projection of Sensor Data over Time
Braun, Marco, Luszek, Moritz, Meuter, Mirko, Spata, Dominic, Kollek, Kevin, Kummert, Anton
Current Deep Learning methods for environment segmentation and velocity estimation rely on Convolutional Recurrent Neural Networks to exploit spatio-temporal relationships within obtained sensor data. These approaches derive scene dynamics implicitly by correlating novel input and memorized data utilizing ConvNets. We show how ConvNets suffer from architectural restrictions for this task. Based on these findings, we then provide solutions to various issues on exploiting spatio-temporal correlations in a sequence of sensor recordings by presenting a novel Recurrent Neural Network unit utilizing Transformer mechanisms. Within this unit, object encodings are tracked across consecutive frames by correlating key-query pairs derived from sensor inputs and memory states, respectively. We then use resulting tracking patterns to obtain scene dynamics and regress velocities. In a last step, the memory state of the Recurrent Neural Network is projected based on extracted velocity estimates to resolve aforementioned spatio-temporal misalignment.
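The key-query correlation step this abstract describes is standard dot-product attention between the new input and the recurrent memory. A minimal sketch, under toy assumptions of my own (2-D encodings, one object per slot; the real unit operates on feature maps inside an RNN cell):

```python
# Minimal sketch of correlating queries (from the current frame) with
# keys (from the memory state) to match object encodings across frames.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def correlate(queries, keys):
    """For each input query, attention weights over memory keys."""
    return [softmax([sum(qi * ki for qi, ki in zip(q, k)) for k in keys])
            for q in queries]

memory_keys = [[1.0, 0.0], [0.0, 1.0]]   # object encodings from memory
input_queries = [[5.0, 0.0]]             # encoding from the current frame
weights = correlate(input_queries, memory_keys)
# weights[0][0] is close to 1: the current object matches memory slot 0
```

The resulting attention pattern is exactly the kind of "tracking pattern" from which the paper regresses per-object velocities.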
Unifying (Machine) Vision via Counterfactual World Modeling
Bear, Daniel M., Feigelis, Kevin, Chen, Honglin, Lee, Wanhee, Venkatesh, Rahul, Kotar, Klemen, Durango, Alex, Yamins, Daniel L. K.
Leading approaches in machine vision employ different architectures for different tasks, trained on costly task-specific labeled datasets. This complexity has held back progress in areas such as robotics, where robust task-general perception remains a bottleneck. In contrast, "foundation models" of natural language have shown how large pre-trained neural networks can provide zero-shot solutions to a broad spectrum of apparently distinct tasks. Here we introduce Counterfactual World Modeling (CWM), a framework for constructing a visual foundation model: a unified, unsupervised network that can be prompted to perform a wide variety of visual computations. CWM has two key components, which resolve the core issues that have hindered application of the foundation model concept to vision. The first is structured masking, a generalization of masked prediction methods that encourages a prediction model to capture the low-dimensional structure in visual data. The model thereby factors the key physical components of a scene and exposes an interface to them via small sets of visual tokens. This in turn enables CWM's second main idea -- counterfactual prompting -- the observation that many apparently distinct visual representations can be computed, in a zero-shot manner, by comparing the prediction model's output on real inputs versus slightly modified ("counterfactual") inputs. We show that CWM generates high-quality readouts on real-world images and videos for a diversity of tasks, including estimation of keypoints, optical flow, occlusions, object segments, and relative depth. Taken together, our results show that CWM is a promising path to unifying the manifold strands of machine vision in a conceptually simple foundation.
- North America > United States > California > Santa Clara County > Stanford (0.04)
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
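Counterfactual prompting, as described in the CWM abstract, amounts to running a prediction model twice (once on the real input, once on a slightly edited one) and reading structure out of the difference. A toy sketch with an entirely hypothetical stand-in predictor (a fixed 1-D shift, where CWM would use a learned masked-prediction model):

```python
# Toy illustration of counterfactual prompting: compare predictions on
# real vs. perturbed inputs to obtain a zero-shot "readout".

def toy_predictor(frame, flow=1):
    """Stand-in prediction model: shifts a 1-D 'image' right by `flow`."""
    return frame[-flow:] + frame[:-flow]

def counterfactual_readout(frame, perturb_index):
    """Localize the influence of one input element by differencing the
    model's outputs on the real and counterfactual inputs."""
    real = toy_predictor(frame)
    counterfactual = frame[:]
    counterfactual[perturb_index] += 10.0   # the counterfactual edit
    changed = toy_predictor(counterfactual)
    return [abs(a - b) for a, b in zip(changed, real)]

frame = [0.0, 1.0, 2.0, 3.0]
diff = counterfactual_readout(frame, perturb_index=0)
# diff is nonzero only where the edited element lands after prediction,
# revealing the model's implicit "flow" without any flow supervision
```

The same differencing pattern, applied to a strong learned predictor, is how the paper extracts keypoints, optical flow, segments, and depth zero-shot.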
Generating Videos with Scene Dynamics
Vondrick, Carl, Pirsiavash, Hamed, Torralba, Antonio
We capitalize on large amounts of unlabeled video in order to learn a model of scene dynamics for both video recognition tasks (e.g. action classification) and video generation tasks (e.g. future prediction). We propose a generative adversarial network for video with a spatio-temporal convolutional architecture that untangles the scene's foreground from the background. Experiments suggest this model can generate tiny videos up to a second long at full frame rate better than simple baselines, and we show its utility at predicting plausible futures of static images. Moreover, experiments and visualizations show the model internally learns useful features for recognizing actions with minimal supervision, suggesting scene dynamics are a promising signal for representation learning. We believe generative video models can impact many applications in video understanding and simulation.
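The foreground/background untangling in this generator follows a two-stream composition: each output frame is a per-pixel blend, video(t) = m(t) * fg(t) + (1 - m(t)) * bg, with a static background and a time-varying foreground and mask. A sketch under toy assumptions (1-D "frames" in place of the real spatio-temporal tensors, hand-picked mask and foreground in place of generator outputs):

```python
# Sketch of the two-stream composition: blend a moving foreground over
# a static background using a per-pixel mask.

def compose_frame(mask, foreground, background):
    return [m * f + (1 - m) * b
            for m, f, b in zip(mask, foreground, background)]

background = [0.2, 0.2, 0.2, 0.2]           # static: one per video
video = []
for t in range(3):                          # foreground/mask vary in time
    mask = [1.0 if i == t else 0.0 for i in range(4)]
    foreground = [0.9] * 4
    video.append(compose_frame(mask, foreground, background))
# frame t shows the foreground value 0.9 at position t, background elsewhere
```

Keeping the background stream constant across time is what forces the network to put motion into the foreground stream, which is the untangling the abstract refers to.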
Generating Videos with Scene Dynamics - MIT
Learning models that generate videos may also be a promising way to learn representations. For example, we can train generators on a large repository of unlabeled videos, then fine-tune the discriminator on a small labeled dataset in order to recognize some actions with minimal supervision. We can also visualize what emerges in the representation for predicting the future. While not all units are semantic, we found there are a few hidden units that fire on objects that are sources of motion, such as people or train tracks. Since generating the future requires understanding moving objects, the network may learn to recognize these objects internally, even though it is not supervised to do so.
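The transfer recipe above (pretrain adversarially on unlabeled video, then fit a small head on the discriminator's features) can be sketched as follows. This is a hypothetical toy: the "frozen discriminator features" are stood in for by two hand-written statistics, and the fine-tuned head is a linear scorer with hand-picked weights.

```python
# Toy sketch of reusing pretrained features with a small fine-tuned head.

def pretrained_features(clip):
    """Stand-in for frozen discriminator features: mean and temporal range."""
    return [sum(clip) / len(clip), max(clip) - min(clip)]

def predict(clip, head_weights, bias):
    """Linear head on frozen features; score > 0 means 'action present'."""
    feats = pretrained_features(clip)
    return sum(w * f for w, f in zip(head_weights, feats)) + bias

# Only head_weights/bias would be fit on the small labeled set; here they
# are chosen by hand so that temporal variation signals an action.
head_weights, bias = [0.0, 1.0], -0.5
moving_clip = [0.0, 1.0, 0.0, 1.0]    # large temporal range -> "action"
static_clip = [0.5, 0.5, 0.5, 0.5]    # zero range -> "no action"
```

The point of the sketch is the division of labor: the expensive representation comes for free from unsupervised video generation, and only the tiny head needs labels.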